The first homework on descriptive statistics and probability
Yakub Rabiutheen
September 20, 2022
Question 1
The data in the file UN11 contains several variables, including ppgdp, the gross national product per person in U.S. dollars, and fertility, the birth rate per 1000 females, both from the year 2009. The data are for 199 localities, mostly UN member countries, but also other areas such as Hong Kong that are not independent countries. The data were collected from the United Nations (2011). We will study the dependence of fertility on ppgdp.
## load data (UN11 ships with the alr4 package)
library(alr4)
data(UN11)
Qn 1.1.1
The predictor is ppgdp and the response is fertility.
# Qn 1.1.1 Standard Scatterplot
library(ggplot2)
ggplot(data = UN11, aes(x = ppgdp, y = fertility)) +
  geom_point()
Annual income, in dollars, is an explanatory variable in a regression analysis. For a British version of the report on the analysis, all responses are converted to British pounds sterling (1 pound equals about 1.33 dollars, as of 2016).
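To convert from USD to GBP, each income value is divided by 1.33. Because income is the explanatory variable, the slope is multiplied by 1.33, not divided: a one-pound increase in income is a 1.33-dollar increase, so the predicted change in the response per unit of income is 1.33 times as large.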
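The effect of the unit change on the slope can be checked numerically. The following is a minimal sketch in R using simulated data. The variable names (income_usd, income_gbp, outcome) and all simulated values are hypothetical and are not part of the assignment data. It assumes income is the explanatory variable, as stated in the problem.

```r
# Hypothetical simulated data: income in USD as the explanatory variable
set.seed(1)
income_usd <- runif(50, min = 20000, max = 100000)
outcome <- 2 + 0.0001 * income_usd + rnorm(50)

# Convert the explanatory variable to pounds (1 pound is about 1.33 dollars)
income_gbp <- income_usd / 1.33

# Fit the same regression with income measured in each currency
slope_usd <- coef(lm(outcome ~ income_usd))[["income_usd"]]
slope_gbp <- coef(lm(outcome ~ income_gbp))[["income_gbp"]]

# Ratio of the two slopes; dividing x by 1.33 multiplies the slope by 1.33
slope_ratio <- slope_gbp / slope_usd
print(slope_ratio)
```

The intercept and the fitted values are the same in both fits. Only the units attached to the slope change.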
How, if at all, does the correlation change?
Pearson's product-moment correlation
data: usdollar and pound
t = 189812531, df = 8, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
1 1
sample estimates:
cor
  1
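Currency changes do not affect the correlation. Correlation is unit-free, so rescaling a variable by a positive constant (here, dividing by 1.33) leaves it unchanged.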
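The invariance can also be checked against a separate response variable, rather than by correlating the two currency versions of the same variable with each other. This is a minimal sketch with simulated data. The names income_usd, income_gbp, and outcome and all values are hypothetical.

```r
# Hypothetical simulated data: income in USD and some response variable
set.seed(2)
income_usd <- runif(40, min = 20000, max = 100000)
outcome <- 5 + 0.00005 * income_usd + rnorm(40)

# The same incomes expressed in pounds (1 pound is about 1.33 dollars)
income_gbp <- income_usd / 1.33

# Correlation with the response, before and after the currency conversion
cor_usd <- cor(income_usd, outcome)
cor_gbp <- cor(income_gbp, outcome)

# TRUE when the two correlations agree up to floating-point tolerance
same_correlation <- isTRUE(all.equal(cor_usd, cor_gbp))
print(same_correlation)
```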
Water runoff in the Sierras (Data file: water in alr4) Can Southern California’s water supply in future years be predicted from past data? One factor affecting water availability is stream runoff. If runoff could be predicted, engineers, planners, and policy makers could do their jobs more efficiently. The data file contains 43 years’ worth of precipitation measurements taken at six sites in the Sierra Nevada mountains (labeled APMAM, APSAB, APSLAKE, OPBPC, OPRC, and OPSLAKE) and stream runoff volume at a site near Bishop, California, labeled BSAAM. Draw the scatterplot matrix for these data and summarize the information available from these plots. (Hint: Use the pairs() function.)
# The water data frame ships with the alr4 package
library(alr4)
data(water)
pairs(water, main = "Sierra Southern California Water Supply Runoff", pch = 21, bg = "green")
                     pi                re            hi              tv
 very liberal         : 8   never       :15   Min.   :2.000   Min.   : 0.000
 liberal              :24   occasionally:29   1st Qu.:3.000   1st Qu.: 3.000
 slightly liberal     : 6   most weeks  : 7   Median :3.350   Median : 6.000
 moderate             :10   every week  : 9   Mean   :3.308   Mean   : 7.267
 slightly conservative: 6                     3rd Qu.:3.625   3rd Qu.:10.000
 conservative         : 4                     Max.   :4.000   Max.   :37.000
 very conservative    : 2
